PDT-VALLEX: Creating a Large-coverage Valency Lexicon for Treebank Annotation
Abstract
Valency theory, as part of the theory of Functional Generative Description ([16]) of language meaning, has been around for some time ([14]). However, this is the first time that a large-scale corpus, the Prague Dependency Treebank (PDT, [4]), has been fully annotated with valency information based on this theory, i.e., with a fully referenced valency lexicon at each relevant verb, noun or adjective. The PDT is a long-term research project whose main aim is the complex manual annotation of a part of the Czech National Corpus of roughly one million words. It is being annotated on three layers. On the lowest, morphological layer, the lexical entry (usually represented by a lemma) and the values of morphological categories (person, number, tense, gender, voice, aspect, ...) are assigned to each word. At the analytical layer, a sentence is represented as a dependency tree. Nodes of the tree represent tokens (i.e., word forms and punctuation marks) as they are found in the original sentence. No node is added or deleted. Edges usually (where it makes sense) represent a relation of formal dependency. In addition, an analytical function capturing the type of dependency relation between the child and its parent is added. The highest (or "deepest", depending on the point of view) layer is the tectogrammatical layer ([7]). It captures the deep (underlying) structure of a sentence. Nodes represent only autosemantic words; synsemantic (i.e., auxiliary) words and punctuation marks are not represented by nodes; they may only affect the values of attributes of the autosemantic words to which they are attached. At this layer, several attributes are assigned to each node, one of the most important ones ...
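The relationship between the layers described above can be illustrated with a minimal Python sketch. This is a toy model under stated assumptions, not the PDT data format: the `Token` class, its field names, the `autosemantic` flag, the `tecto_nodes` helper, and the English example sentence are all hypothetical, and the analytical function labels are used only for illustration. Each token carries its morphological information and its analytical parent; a simplified tectogrammatical view is then obtained by keeping only autosemantic words and attaching each to its nearest autosemantic ancestor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """One analytical-layer node: a word form or punctuation mark (hypothetical model)."""
    form: str               # token as found in the original sentence
    lemma: str              # morphological layer: lexical entry
    morph: dict             # morphological layer: category -> value
    parent: Optional[int]   # analytical layer: index of the parent token (None = root)
    afun: str               # analytical function label (illustrative only)
    autosemantic: bool      # assumption: marked by hand for this toy example


def tecto_nodes(tokens):
    """Return (lemma, parent_lemma) pairs for autosemantic tokens only."""

    def nearest_autosemantic(i):
        # Climb the analytical tree until an autosemantic ancestor is found.
        while i is not None and not tokens[i].autosemantic:
            i = tokens[i].parent
        return i

    result = []
    for tok in tokens:
        if not tok.autosemantic:
            continue  # auxiliaries and punctuation get no node at this layer
        head = nearest_autosemantic(tok.parent)
        result.append((tok.lemma, tokens[head].lemma if head is not None else None))
    return result


# Analytical layer: one node per token, none added or deleted.
sentence = [
    Token("Peter", "Peter", {"number": "sg"}, 2, "Sb", True),
    Token("has", "have", {"tense": "pres"}, 2, "AuxV", False),
    Token("slept", "sleep", {"voice": "act"}, None, "Pred", True),
    Token(".", ".", {}, 2, "AuxK", False),
]

print(tecto_nodes(sentence))
```

The sketch ignores how auxiliary words and punctuation affect the attributes of the autosemantic words they attach to, and it does not model the valency lexicon references themselves; the actual annotation is manual and considerably richer.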
Related resources
Valency in the Prague Dependency Treebank: Building the Valency Lexicon
In this article we focus on valency, which belongs to the core phenomena being captured in the underlying level of the Prague Dependency Treebank (PDT). We present a summary of the basic principles of the applied theoretical framework including proposals for suitable refinement relevant to NLP. The current status of description of valency behavior of verbs, nouns and adjectives is outlined. We ...
Building the PDT-Vallex valency lexicon
In our contribution, we relate the development of a richly annotated corpus and a computational valency lexicon. Our valency lexicon, called PDT-Vallex (Hajič et al., 2003), was created as a "byproduct" of the annotation of the Prague Dependency Treebank (PDT), but it has become an important resource for further linguistic research as well as for computational processing of the Czech language. W...
Building a Bilingual ValLex Using Treebank Token Alignment: First Observations
In this paper we explore the potential and limitations of a concept of building a bilingual valency lexicon based on the alignment of nodes in a parallel treebank. Our aim is to build an electronic Czech↔English Valency Lexicon by collecting equivalences from bilingual treebank data and storing them in two already existing electronic valency lexicons, PDT-VALLEX and Engvallex. For this task a s...
Inherently Pronominal Verbs in Czech: Description and Conversion Based on Treebank Annotation
This paper describes results of a study related to the PARSEME Shared Task on automatic detection of verbal Multi-Word Expressions (MWEs) which focuses on their identification in running texts in many languages. The Shared Task’s organizers have provided basic annotation guidelines where four basic types of verbal MWEs are defined including some specific subtypes. Czech is among the twenty lang...
The verbal valency in the Prague Dependency Treebank from the annotator's point of view
The core ingredient of the Prague Dependency Treebank (PDT; see Hajič, this volume), "valency", indicates the capability of lexical units to combine with other complementations. The PDT has adopted the concept of the valency theory of the Functional Generative Description (FGD) (see Sgall, 1967; Sgall et al., 1986). The valency theory of the FGD was first developed for verbs, then also for other p...
Publication date: 2003